Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts

Authors

  • Yong Xu
  • François Yvon
Abstract

Resources for evaluating sentence-level and word-level alignment algorithms are unsatisfactory. For sentence alignment, existing data is too scarce, especially for difficult bitexts that contain instances of non-literal translation. For word-level alignment, most available hand-aligned data provide a complete word-level annotation that is difficult to exploit, for lack of a clear semantics for alignment links. In this study, we propose new methodologies for collecting human judgements on alignment links, which have been used to annotate four new data sets, at the sentence and at the word level. These will be released online, with the hope that they will prove useful to evaluate alignment software and quality estimation tools for automatic alignment.

Similar resources

Robust Sub-Sentential Alignment of Phrase-Structure Trees

Data-Oriented Translation (DOT), based on Data-Oriented Parsing (DOP), is a language-independent MT engine which exploits parsed, aligned bitexts to produce very high quality translations. However, data acquisition constitutes a serious bottleneck as DOT requires parsed sentences aligned at both sentential and sub-structural levels. Manual substructural alignment is time-consuming, error-prone a...

The Effect of Intra-sentential, Inter-sentential and Tag-sentential Switching on Teaching Grammar

The present study examined the comparative effect of different types of code-switching, i.e., intra-sentential, inter-sentential, and tag-sentential switching, on EFL learners' grammar learning and teaching. To this end, a sample of 60 Iranian female and male students in two different institutions in Qazvin was selected. They were assigned to four groups. Each group was randomly assigned to one of the...

Robust Language Pair-Independent Sub-Tree Alignment

Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation, make use of tree pairs aligned at sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both source and target languages. We propose a novel, lang...

Linguistically-Based Sub-Sentential Alignment for Terminology Extraction from a Bilingual Automotive Corpus

We present a sub-sentential alignment system that links linguistically motivated phrases in parallel texts based on lexical correspondences and syntactic similarity. We compare the performance of our sub-sentential alignment system with different symmetrization heuristics that combine the GIZA++ alignments of both translation directions. We demonstrate that the aligned linguistically motivated p...

Two Tools for Creating and Visualizing Sub-sentential Alignments of Parallel Text

We present two web-based, interactive tools for creating and visualizing sub-sentential alignments of parallel text. Yawat is a tool to support distributed, manual word- and phrase-alignment of parallel text through an intuitive, web-based interface. Kwipc is an interface for displaying words or bilingual word pairs in parallel, word-aligned context. A key element of the tools presented here is t...

Publication date: 2016